Closed-Form Supervised Dimensionality Reduction with GLMs
Authors
Abstract
The problem of supervised dimensionality reduction (SDR) is to combine learning a good predictor with finding predictive structure, such as a low-dimensional representation that captures the predictive ability of the features while ignoring the "noise". Indeed, performing dimensionality reduction simultaneously with learning a predictor often yields better predictive performance than performing the DR step separately from learning the predictor, as was demonstrated previously (e.g., see SVDM of [2] and SDR-MM of [3]), just as embedded feature selection (e.g., sparse regression) often outperforms filter methods. However, existing SDR approaches are typically limited to specific settings. For example, SVDM [2] effectively assumes a Gaussian-noise data model when minimizing its sum-squared reconstruction loss, and is restricted to classification problems by its use of an SVM-like hinge loss as the prediction loss. The SDR-MM method of [3] handles various data types (e.g., binary and real-valued) but is again limited to multi-class classification problems. Recent work on distance metric learning [6, 5] is likewise limited to a Gaussian data assumption and to discrete-label (typically binary) classification problems. Indeed, the majority of supervised dimensionality reduction methods can be viewed as jointly learning a particular (often just linear) mapping from the feature space to a low-dimensional hidden-variable space, together with a particular classifier that maps the hidden variables to the class label.

Our framework is more general: it treats both features and labels as exponential-family random variables and allows mixing and matching data- and label-appropriate generalized linear models, thus handling both classification and regression, with both discrete and real-valued data. It can also be viewed as discriminative learning that minimizes the negative conditional log-likelihood of the class given the hidden variables, while using as a regularizer the negative conditional log-likelihood of the features given the low-dimensional hidden-variable "predictive" representation. The main advantage of our approach, besides generalizing to a wider range of SDR problems, is that it uses simple, closed-form update rules in its alternating minimization procedure and does not require solving an optimization problem at every iteration. The method can be implemented in a few lines of Matlab, runs fast, and is guaranteed to converge (to a local minimum, like most existing hidden-variable model learning approaches). Both the convergence property and the closed-form update rules follow from the use of auxiliary functions bounding each part of the objective (i.e., the reconstruction and prediction losses). We exploit the additive property of auxiliary functions to "stack" together multiple objectives and perform, in a sense, a "multi-way" DR, i.e., joint dimensionality reduction over several data sets, such as the feature vectors $X$ and the labels $Y$.

More specifically, let $X$ be an $N \times D$ data matrix with entries $X_{nd}$, where $N$ is the number of i.i.d. samples and the $n$-th sample is a $D$-dimensional row vector denoted $x_n$. Let $Y$ be an $N$-dimensional vector of class labels. We assume that our data points $x_n$, $n = 1, \ldots, N$, are noisy versions of some "true points" $\theta_n$ that live in a low-dimensional space, and that this low-dimensional representation is predictive of the class.
It is assumed that the noise is applied independently to each coordinate of $x_n$ (i.e., all dependencies among the dimensions are captured by the low-dimensional representation), and that the noise follows exponential-family distributions with natural parameters $\theta_n$, with possibly different members of the exponential family used for different dimensions. Namely, it is assumed that the $N \times D$ parameter matrix $\Theta$ is a product of two low-rank matrices, $\Theta_{nd} = \sum_{l} U_{nl} V_{ld}$, where the rows of the $L \times D$ matrix $V$ are the basis vectors and the rows of the $N \times L$ matrix $U$ are the coordinates of the "true points" $\theta_n$, $n = 1, \ldots, N$, in the $L$-dimensional space (for non-Gaussian noise, a nonlinear surface in the original data space). We assume an exponential-family noise distribution for each $X_{nd}$ with the corresponding natural parameter $\Theta_{nd}$, i.e.,
$$\log P(X_{nd} \mid \Theta_{nd}) = X_{nd}\,\Theta_{nd} - G(\Theta_{nd}) + \log P_0(X_{nd}),$$
where $G(\Theta_{nd})$ is the cumulant (log-partition) function of the corresponding exponential-family member.
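To make the joint objective concrete, the learning problem can be read as minimizing a prediction loss $-\sum_n \log P(Y_n \mid u_n)$ plus a reconstruction loss $-\sum_{n,d} \log P(X_{nd} \mid \Theta_{nd})$ over $U$, $V$, and the parameters of the label GLM. The sketch below is a deliberately simplified illustration in Python (the abstract describes the actual implementation as short Matlab code): it assumes Gaussian noise for both the features and a real-valued label, in which case the alternating closed-form updates reduce to ordinary least-squares solves. The function name, the trade-off weight `lam`, and the small ridge term are illustrative choices of ours, not part of the cited method.

```python
import numpy as np

def sdr_gaussian_sketch(X, y, L=2, lam=1.0, n_iters=50, seed=0):
    """Toy alternating-minimization SDR sketch (Gaussian / squared-loss case only).

    Minimizes  ||X - U V||^2 + lam * ||y - U w||^2  over U (N x L),
    V (L x D), and w (L,), using closed-form least-squares updates.
    This only illustrates the 'joint reconstruction + prediction' objective;
    the paper's method covers general exponential-family losses via
    auxiliary-function bounds.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    U = rng.standard_normal((N, L))
    V = rng.standard_normal((L, D))
    w = rng.standard_normal(L)

    for _ in range(n_iters):
        # Update each row of U: (V V^T + lam w w^T) u_n^T = V x_n^T + lam y_n w
        A = V @ V.T + lam * np.outer(w, w) + 1e-8 * np.eye(L)
        B = X @ V.T + lam * np.outer(y, w)      # N x L right-hand sides
        U = np.linalg.solve(A, B.T).T
        # Update basis V by least squares given U
        V = np.linalg.lstsq(U, X, rcond=None)[0]
        # Update prediction weights w by least squares given U
        w = np.linalg.lstsq(U, y, rcond=None)[0]
    return U, V, w
```

For non-Gaussian members of the exponential family (e.g., Bernoulli features or a multinomial label), these least-squares steps would be replaced by the closed-form updates that the abstract attributes to auxiliary-function bounds on each loss term.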
Similar papers
Semi-Supervised Dimensionality Reduction
Dimensionality reduction is among the keys to mining high-dimensional data. This paper studies semi-supervised dimensionality reduction. In this setting, besides abundant unlabeled examples, domain knowledge in the form of pairwise constraints is available, which specifies whether a pair of instances belongs to the same class (must-link constraints) or to different classes (cannot-link constraints)...
Covariance Operator Based Dimensionality Reduction with Extension to Semi-Supervised Settings
We consider the task of dimensionality reduction for regression (DRR) informed by real-valued multivariate labels. The problem is often treated as a regression task where the goal is to find a low dimensional representation of the input data that preserves the statistical correlation with the targets. Recently, Covariance Operator Inverse Regression (COIR) was proposed as an effective solution t...
A Supervised Probabilistic Principal Component Analysis Mixture Model in a Lossless Dimensionality Reduction Framework for Face Recognition
In this paper, we first propose a supervised version of the probabilistic principal component analysis mixture model. Then, we consider learning a predictive model with projection penalties as an approach to dimensionality reduction without loss of information for face recognition. In the proposed method, first a local linear underlying manifold of the data samples is obtained using the supervised...
Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds
We propose “Supervised Principal Component Analysis (Supervised PCA)”, a generalization of PCA that is uniquely effective for regression and classification problems with high-dimensional input data. It works by estimating a sequence of principal components that have maximal dependence on the response variable. The proposed Supervised PCA is solvable in closed-form, and has a dual formulation th...
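Since this related method is also advertised as closed-form, a brief sketch may help fix ideas. The following assumes a dependence-maximization (HSIC-style) formulation in which the projection directions are the leading eigenvectors of $X^\top H K_y H X$, with $K_y$ a kernel over the targets and $H$ the centering matrix; the function name and the choice of a linear label kernel are our own illustrative assumptions, not necessarily the exact formulation of the cited paper.

```python
import numpy as np

def supervised_pca_sketch(X, Y, n_components=2):
    """Hypothetical sketch of a dependence-maximizing Supervised PCA.

    X : (n_samples, n_features) data matrix
    Y : (n_samples, n_targets) response matrix (real-valued or one-hot labels)

    Projection directions are the top eigenvectors of X^T H K_y H X, where
    K_y = Y Y^T is a linear kernel over the targets and H centers the data.
    Illustration only; not a verbatim reproduction of the cited method.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    K_y = Y @ Y.T                                 # linear kernel on targets
    M = X.T @ H @ K_y @ H @ X                     # symmetric, features x features
    eigvals, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :n_components]        # top eigenvectors as projections
    return X @ W                                  # low-dimensional representation
```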
Locally Linear Embedded Eigenspace Analysis
The existing nonlinear local methods for dimensionality reduction yield impressive results in data embedding and manifold visualization. However, they also open up the problem of how to define a unified projection from new data to the embedded subspace constructed by the training samples. Thinking globally and fitting locally, we present a new linear embedding approach, called Locally Embedded ...